This report is for ETC5521 Assignment 1 by Team taipan, comprising Helen Evangelina and Yiwen Jiang.
This report presents findings on woodland caribou between 1988 and 2016, based on tracking data collected under the B.C. Ministry of Environment & Climate Change. It mainly analyses changes in the number of woodland caribou; other analyses cover the habitat changes caused by seasonal differences, the effects of implementing management plans, and the reasons tag deployments ended. In the following section, we describe the data set, where the data came from, and what the data was prepared for. The data description also covers how we transformed and cleaned the raw data for analysis. The analysis was carried out in R, using RStudio.
Caribou are the only large herbivore widely distributed in high-elevation habitat, and they act as agents of plant and lichen diversity through trampling and foraging. Caribou have also been a significant resource for Indigenous peoples for millennia (Environment (2014)). The survival rate of caribou is generally relatively low due to predation by wolves (Canis lupus). The caribou is listed as “Vulnerable” on the International Union for Conservation of Nature (IUCN) Red List. Given this status, it is essential to monitor caribou numbers, as monitoring is vital to effective conservation. In this report we present our findings from exploring the caribou tracking data.
The tracking data was collected by the B.C. Ministry of Environment & Climate Change over 28 years (1988 - 2016) and was prepared for the study of the management and recovery of the caribou (BC Ministry of Environment (2014)). It includes information on 286 caribou and covers almost 250,000 locations.
Figure 1.1: Missing values in the individuals data
There are several limitations associated with this dataset. The most noticeable is the large number of missing values (NAs), which limits the analysis. In the individuals data, over half of the values are missing (refer to Figure 1.1). Few analyses can be performed because, once the NAs are removed, too little data remains, and insufficient data would lead to inaccurate results. For example, 93.36% of the values in the pregnant variable are missing.
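The missing-value shares quoted above can be computed along the following lines. This is a minimal sketch in base R on a made-up stand-in table (`individuals_toy` is hypothetical and its values are invented), not the code that produced Figure 1.1.

```r
# Toy stand-in for the individuals data; all values are made up for illustration
individuals_toy <- data.frame(
  animal_id = c("A1", "A2", "A3", "A4"),
  sex       = c("f", "f", NA, "m"),
  pregnant  = c(NA, NA, NA, TRUE),
  stringsAsFactors = FALSE
)

# Share of missing values in each column, as a percentage
missing_pct <- colMeans(is.na(individuals_toy)) * 100
missing_pct

# Share of missing cells across the whole table, as a percentage
overall_missing_pct <- mean(is.na(individuals_toy)) * 100
overall_missing_pct
```

Applied to the real individuals data, the per-column result would show which variables, such as pregnant, are too sparse to analyse.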
Caribou have a low reproductive rate because females have only one calf per year and do not reproduce until they are two years old. The sex ratio should therefore be a good indicator of the trend in caribou numbers. However, only five of the 286 caribou are male, so any analysis that uses the sex ratio as an indicator would be biased.
Another limitation is that deploy_off_type mainly consists of “unknown”, indicating incomplete records, which prevents accurate analysis. We wanted to check how often equipment failure appears in deploy_off_type to see whether the equipment works properly; if not, the quality of the equipment should be improved. However, it turned out that deploy_off_type contains no equipment failure values. The same applies to death_cause, which contains many “unknown” values. In addition, the values in the death_cause variable are named inconsistently.
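A quick way to inspect a categorical variable such as death_cause or deploy_off_type is to tabulate its labels. The sketch below uses invented labels (`death_cause_toy` is hypothetical; the real values may differ), and the trimming and lower-casing step is only one possible way to reduce inconsistent naming, not necessarily the one used in this report.

```r
# Toy labels only; the real values in death_cause may differ
death_cause_toy <- c("Unknown", "unknown ", "Predation - Wolf", "predation - wolf", NA)

# Count each label as recorded, keeping NA visible
table(death_cause_toy, useNA = "ifany")

# One possible normalisation: trim surrounding whitespace and lower-case
death_cause_norm <- tolower(trimws(death_cause_toy))
table(death_cause_norm, useNA = "ifany")
```

After normalisation, labels that differ only by case or stray spaces are counted together, which makes the share of “unknown” records easier to read off.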
The dataset tracks woodland caribou in northern British Columbia and is published in the Movebank Data Repository at _https://www.datarepository.movebank.org/handle/10255/move.955_. The data was collected by fitting tracking tags to 286 caribou, which recorded almost 250,000 locations from 1988 to 2016, and was accessed through Movebank.
The boreal woodland caribou, also known as woodland caribou, boreal forest caribou and forest-dwelling caribou, is a North American subspecies of the reindeer, with the vast majority of animals in Canada. They prefer lichen-rich mature forests and mainly live in marshes, bogs, lakes and river regions. Caribou are considered an ancient member of the deer family Cervidae (Banfield (1974)). They are smaller than moose (Alces americanus) and elk (Cervus canadensis), standing 1.0 - 1.2 metres high at the shoulder (Thomas (2002)). The caribou is classified as “Vulnerable” on the International Union for the Conservation of Nature’s (IUCN) Red List, so the data was provided for the B.C. Ministry of Environment & Climate Change's study reporting on the management and recovery of the caribou.
Because this data set is used for analysing the reproduction of a species, the data is obtained by observation rather than experiment; there is no treatment group or control group. Data collection started in 1988 and ended in 2016. Movebank collected the location data of individual animals over time by tracking the biologging sensors attached to the animals (Kranstauber et al., 2011). The data is split into two files, provided in .csv format. The variables in each file are described below.
The individuals data comes from Mountain caribou in British Columbia-reference-data.csv and contains the relevant information for 286 caribou. The variables are shown in Table 2.1.

| Variable | Class | Description |
|---|---|---|
| animal_id | character | Individual identifier for animal |
| sex | character | Sex of animal |
| life_stage | character | Age class (in years) at beginning of deployment |
| pregnant | logical | Whether animal was pregnant at beginning of deployment |
| with_calf | logical | Whether animal had a calf at time of deployment |
| death_cause | character | Cause of death |
| study_site | character | Deployment site or colony, or a location-related group such as the herd or pack name |
| deploy_on_longitude | double | Longitude where animal was released at beginning of deployment |
| deploy_on_latitude | double | Latitude where animal was released at beginning of deployment |
| deploy_on_comments | character | Additional information about tag deployment |
| deploy_off_longitude | double | Longitude where deployment ended |
| deploy_off_latitude | double | Latitude where deployment ended |
| deploy_off_type | character | Classification of tag deployment end (see table below for full description) |
| deploy_off_comments | character | Additional information about tag deployment end |
The locations data comes from Mountain caribou in British Columbia-gps.csv and contains the location of each tracked caribou, recorded every four hours. The variables are shown in Table 2.2.

| Variable | Class | Description |
|---|---|---|
| event_id | double | Identifier for an individual measurement |
| animal_id | character | Individual identifier for animal |
| study_site | character | Deployment site or colony, or a location-related group such as the herd or pack name |
| season | character | Season (Summer/Winter) at time of measurement |
| timestamp | datetime | Date and time of measurement |
| longitude | double | Longitude of measurement |
| latitude | double | Latitude of measurement |
The data used is the dataset from the Science update for the South Peace Northern Caribou (Rangifer tarandus caribou pop. 15) in British Columbia, available from Movebank (BC Ministry of Environment, 2014). The raw datasets are first read using the read_csv() function. The variable names in the raw datasets use “-” instead of “_”. A dash in a variable name can cause problems, because a syntactically valid variable name in R consists of letters, numbers, dots and underscores. Another problem is that the values in “animal-life-stage” contain inconsistent spacing. The datasets also have a lot of NA values. Therefore, the data is cleaned using the tidyverse and janitor libraries.
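To illustrate why dashed names are awkward, the sketch below builds a small made-up data frame (`raw_toy` is hypothetical) with dash-separated column names and inconsistently spaced values. It uses base R `gsub()` as a stand-in for the renaming and whitespace steps, so it is an illustration of the idea rather than the janitor-based cleaning used in this report.

```r
# Toy data frame with dash-separated column names like those in the raw files
raw_toy <- data.frame(
  `animal-id`         = c("A1", "A2"),
  `animal-life-stage` = c("3 - 5", "3-5"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# A dashed name must be wrapped in backticks; without them R would
# parse raw_toy$animal-id as a subtraction
raw_toy$`animal-id`

# Base R stand-in for the renaming step: swap dashes for underscores
clean_toy <- raw_toy
names(clean_toy) <- gsub("-", "_", names(clean_toy), fixed = TRUE)

# Remove all spaces so that "3 - 5" and "3-5" become the same label
clean_toy$animal_life_stage <- gsub(" ", "", clean_toy$animal_life_stage, fixed = TRUE)
clean_toy
```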
To clean the individuals data, the clean_names() function from the janitor package is first used to return a data.frame with clean names, that is, variable names in a tidier form. As mentioned before, using a dash in variable names is not appropriate in R; for example, the raw name “deploy-off-latitude” becomes “deploy_off_latitude”. Next, the result is passed to transmute(), which computes new columns and drops the existing columns that are not named in the call. This keeps only the variables we need and gives them consistent names. Within this step, the whitespace in the life stage values is removed with the str_remove_all() function to address the inconsistent spacing. After that, the “reproductive_condition” variable is split into “pregnant” and “with_calf” using the separate() function, as this variable actually contains two dimensions. The two new columns are then converted to TRUE or FALSE values using mutate() and case_when().
The locations data is cleaned in the same way as the individuals data: clean_names() is used first to obtain a data.frame with clean names, and transmute() is then used to compute the new columns and drop the remaining ones. After cleaning both datasets, the final datasets are written to csv format using the write_csv() function.
```r
# Load libraries
library(tidyverse)
library(janitor)

# Import data
individuals_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-reference-data.csv")
locations_raw <- read_csv("./caribou-location-tracking/raw/Mountain caribou in British Columbia-gps.csv")

# Clean individuals
individuals <- individuals_raw %>%
  clean_names() %>%
  transmute(animal_id,
            sex = animal_sex,
            # Getting rid of whitespace to address inconsistent spacing
            # NOTE: life stage is as of the beginning of deployment
            life_stage = str_remove_all(animal_life_stage, " "),
            reproductive_condition = animal_reproductive_condition,
            # Cause of death "cod" is embedded in a comment field
            death_cause = str_remove(animal_death_comments, ".*cod "),
            study_site,
            deploy_on_longitude,
            deploy_on_latitude,
            # Renaming to maintain consistency "deploy_on_FIELD" and "deploy_off_FIELD"
            deploy_on_comments = deployment_comments,
            deploy_off_longitude,
            deploy_off_latitude,
            deploy_off_type = deployment_end_type,
            deploy_off_comments = deployment_end_comments) %>%
  # reproductive_condition actually has two dimensions
  separate(reproductive_condition, into = c("pregnant", "with_calf"), sep = ";", fill = "left") %>%
  mutate(pregnant = str_remove(pregnant, "pregnant: ?"),
         with_calf = str_remove(with_calf, "with calf: ?")) %>%
  # TRUE and FALSE are indicated by Yes/No or Y/N
  mutate_at(vars(pregnant:with_calf), ~ case_when(str_detect(., "Y") ~ TRUE,
                                                  str_detect(., "N") ~ FALSE,
                                                  TRUE ~ NA))

# Clean locations
locations <- locations_raw %>%
  clean_names() %>%
  transmute(event_id,
            animal_id = individual_local_identifier,
            study_site = comments,
            season = study_specific_measurement,
            timestamp,
            longitude = location_long,
            latitude = location_lat)

# Write to CSV
write_csv(individuals, "./caribou-location-tracking/individuals.csv")
write_csv(locations, "./caribou-location-tracking/locations.csv")
```
This dataset is primarily used to analyse changes in the number of caribou from 1988 to 2016 in order to observe the survival of the species. As a management plan was introduced, we would also like to analyse whether it has been effective in increasing the number of caribou over time.
The primary question to answer from this dataset is: what is the trend in the number of caribou over time?
From the primary question, we came up with three secondary questions, which are as follows:
- Do the habitats vary between summer and winter?
- What is the trend in the classification of tag deployment end (deploy_off_type)?
- Has the management plan increased the number of caribou?
To get an overview of the survival of caribou in different herds, we first look at changes in the number of caribou that were tracked. We use the locations data to represent the change in the number of caribou in each herd. The locations data records the position of each caribou every four hours; tracking continued until tag deployment ended.
Figure 3.1: Monthly number of caribou tracked between 1988 and 2016
As shown in Figure 3.1, the number of caribou tracked is highly volatile. The lowest number between 1988 and 2016 is almost 10, but it increased to over fifty in less than five years. The pattern is quite different between 1992 and 2001 because there are no tracking records in those years. After 2010, the number of caribou tracked decreases gradually.
Due to the limitations of this plot, some information is not shown. For example, it does not show what caused the number of tracked caribou to decrease: equipment failure or the death of the caribou. If the reason is equipment failure, it is not reasonable to conclude that the number of caribou changed. We will therefore explore the tracking data further through three questions. Do the habitats vary between summer and winter? What is the trend in the classification of tag deployment end (deploy_off_type)? Has the management plan increased the number of caribou?
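The monthly number of tracked caribou per herd can be derived from the locations data by counting distinct animals in each month. The following is a minimal base R sketch on invented records (`locations_toy` and the column `n_caribou` are hypothetical names); it illustrates the counting idea and is not necessarily the code behind Figure 3.1.

```r
# Toy stand-in for the locations data; all records are made up
locations_toy <- data.frame(
  animal_id  = c("A1", "A1", "A2", "A2", "A3"),
  study_site = c("Graham", "Graham", "Graham", "Graham", "Scott"),
  timestamp  = as.POSIXct(
    c("2010-01-05 00:00:00", "2010-01-05 04:00:00",
      "2010-01-20 00:00:00", "2010-02-01 00:00:00",
      "2010-02-03 00:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Label each record with its calendar month
locations_toy$month <- format(locations_toy$timestamp, "%Y-%m")

# Keep one row per animal, herd and month so repeated fixes are not double counted
per_animal <- unique(locations_toy[, c("study_site", "month", "animal_id")])

# Count the distinct animals tracked in each herd and month
monthly_counts <- aggregate(animal_id ~ study_site + month,
                            data = per_animal, FUN = length)
names(monthly_counts)[names(monthly_counts) == "animal_id"] <- "n_caribou"
monthly_counts
```

Counting distinct animals rather than rows matters here because each caribou contributes a location record roughly every four hours.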
Figure 3.2: Seasonal differences of habitats (Coloured by seasons)
The caribou reside in British Columbia within eight herd ranges: Hart Ranges, Graham, Moberly, Scott, Burnt Pine, Kennedy Siding, Quintette, and Narraway. Their habitats vary seasonally so that they can obtain forage and cover and avoid predators.
Typically, in winter, caribou select low-elevation forests or windswept alpine ridges where snow cover is relatively shallow, so that they can forage for terrestrial lichens. The habitats in the two seasons are coloured in Figure 3.2: red and blue dots mark the locations recorded in summer and winter respectively. Most of the caribou's summer and winter habitats overlap, but there are some differences. In summer, caribou activity is more concentrated in the mountains, whereas in winter it extends over flatter terrain.
Figure 3.3: Seasonal differences of habitats (Coloured by herds and separated by seasons)
We then separate the habitats by season and colour them by herd, as shown in Figure 3.3. We can observe that some locations differ between seasons. Compared to summer, caribou in the same herd are more concentrated in winter. In summer, most caribou migrate towards the central core of the Rocky Mountains (high-elevation habitats); because wolves live primarily at low elevation, this spatial separation helps the caribou avoid predators. This plot still makes it hard to analyse the habitat differences for each herd because the habitats of some herds overlap. For example, the dots for the Burnt Pine herd are covered by those for the Kennedy herd.
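Where overlapping dots make a map hard to read, a numeric summary per herd and season can complement it. The sketch below computes the mean recorded position (a simple centroid) for each herd and season on invented coordinates (`locations_toy` and `centroids` are hypothetical names). It is offered as one possible complement under these assumptions, not as part of the analysis reported here.

```r
# Toy stand-in for the locations data; coordinates are made up
locations_toy <- data.frame(
  study_site = rep(c("Graham", "Scott"), each = 4),
  season     = rep(c("Summer", "Winter"), times = 4),
  longitude  = c(-122.9, -122.6, -122.8, -122.5, -123.4, -123.2, -123.5, -123.1),
  latitude   = c(56.4, 56.2, 56.5, 56.1, 55.3, 55.1, 55.4, 55.0),
  stringsAsFactors = FALSE
)

# Mean position (centroid) of the recorded locations per herd and season
centroids <- aggregate(cbind(longitude, latitude) ~ study_site + season,
                       data = locations_toy, FUN = mean)
centroids
```

Comparing the summer and winter rows for the same herd gives a rough indication of how far its centre of activity shifts between seasons, even when the dots for that herd are hidden behind another herd on the map.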